Growth-defense trade-off article analyses

Author

Julia Harenčár

Published

March 29, 2023

Analyses of herbivory and chemistry data for the article, “Growth-defense trade-offs facilitate species coexistence: empirical evidence from recently diverged tropical plants.”

Loading packages and setting defaults:

Code
# analysis
library("DHARMa")
library("car")
library("lme4")
library("glmmTMB") 
# plotting
library("tidyverse")
library("ggplot2")
library("patchwork")
library("RColorBrewer")
theme_set(theme_bw())

Wild herbivory

Analyses comparing wild herbivory between Costus villosissimus and C. allenii.

Presence v absence

Analysis comparing the presence or absence of herbivory on wild plants from both 2019 and 2022.

First, we read in the 2019 and 2022 data, create presence/absence columns for each, and combine them into a single dataframe.

Code
# 2019 data
Wherb19 <- read.csv("vill-all_RLB-herbivory_data_2019-Herbivory.csv", header=TRUE)
# create presence/absence of herbivory column
Wherb19 <- Wherb19 %>% mutate(pa = case_when(
  per_herbivory == 0 ~ 0,
  per_herbivory != 0 ~ 1
))

# 2020 data
Wherb22 <- read.csv("2022_wild_herbivory.csv", header = T)
# create presence/absence of herbivory column
Wherb22 <- Wherb22 %>% 
  mutate(pa = case_when(
  num_w_RLBH == 0 ~ 0,
  num_w_RLBH != 0 ~ 1
)) %>% dplyr::select(date, spp, id, pa)

# making combined 2019/2022 dataset
Wherb19$year <- '2019'
Wherb22$year <- '2022'
names(Wherb19) <- c("date","ID", "spp", "ave_per_herb", "pa", "year")
Wherb_full <- Wherb22 %>% 
  dplyr::select(spp,pa,year) %>% 
  full_join((Wherb19 %>% dplyr::select( "spp", "pa", "year")),.) 

Next, we evaluate the potential impacts of plant species and year on the presence vs absence of herbivory with a binomial regression. We first check to ensure the model is not under- or over-dispersed.

Code
# binomial regression 
bin_reg <- glm(Wherb_full$pa ~ Wherb_full$spp + Wherb_full$year, family = "binomial")

# Check over or under-dispersion with the package DHARMA. 
simOut <- simulateResiduals(bin_reg)
plot(simOut) # dispersion looks good

Code
testDispersion(simOut) # dispersion = 1.0241, p-value = 0.808; not overdispersed


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 1.0234, p-value = 0.776
alternative hypothesis: two.sided

The model is not overdispersed! :D

Now we look at the results of the regression and plot wild herbivory by year and species.

Code
# binomial regression summary
summary(bin_reg)

Call:
glm(formula = Wherb_full$pa ~ Wherb_full$spp + Wherb_full$year, 
    family = "binomial")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4029  -0.5089   0.3387   0.3387   2.0535  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)           0.5860     0.2836   2.066   0.0388 *  
Wherb_full$sppvill   -4.8085     0.5409  -8.890  < 2e-16 ***
Wherb_full$year2022   2.2435     0.4931   4.550 5.37e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 347.06  on 250  degrees of freedom
Residual deviance: 160.85  on 248  degrees of freedom
AIC: 166.85

Number of Fisher Scoring iterations: 6
Code
# # code for choosing colorblind friendly colors. 
# library(colorblindcheck)
# palette_check(c("#cc79a7", "#f0e442"), plot = TRUE)

# Making a simple barplot of presence/absence herbivory data from both years
# calculate percentages for plot
PA_barplot <- Wherb_full %>% group_by(spp, year) %>% summarise(present = sum(pa), N = n()) %>% mutate( percent = present/N*100)
# create plot
H_PA <- ggplot(data=PA_barplot, aes(x=year, y=percent, fill=spp)) +
  geom_bar(stat="identity", position = position_dodge()) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("alle" = expression(italic("C. allenii")),
                             "vill" = expression(italic("C. villosissimus")))) +
  ggtitle("Field surveyed herbivory") + 
  ylim(0, 100) +
  xlab("year") +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) +
  ylab('% of individuals with herbivory')
# print plot
H_PA

Code
# # save plot
# ggsave("wild_herbivory_plot.png", H_PA, device = "png", width = 10, height = 7, units = "cm")

We see that both species and year are significant predictors of the presence of herbivory at a < 0.00001 level, with greater herbivory on C.allenii and during the wet year, 2022.

Percent herbivory (subset)

Here we compare the amount of herbivory as a percentage of leaf area between species for a random subset of wild plants in 2019 and 2022. First, we plot the data.

Code
# Read in the 2019 data (refresh)
Wherb19 <- read.csv("vill-all_RLB-herbivory_data_2019-Herbivory.csv", header=TRUE)

# select random subset for herbivory percent analysis
set.seed(1) # use 1 to match paper
Wherb19_subset <- Wherb19 %>% 
  group_by(species) %>% 
  sample_n(10)

# # Non-parametric test of 2019 only
# wilcox.test(Wherb19_subset$per_herbivory ~ Wherb19_subset$species) # W = 90, p-value = 0.0007512
# ## MEANS:  allenii - 2.54; vill - 0.00
# #t.test(Wherb19_subset$per_herbivory ~ Wherb19_subset$species) 
# # t = 3.7367, df = 9, p-value = 0.004649 - differs in signif. use wilcox as appropriate for non-normal data

# Read in the 2022 data 
Wherb22_subset <- read.csv("2022_Wild_Percent_Herbivory_subset.csv", header = T)

# # Non-parametric test of 2022 data only
# wilcox.test(Wherb22_subset$ave_per_herb ~ Wherb22_subset$spp) # W = 96, p-value = 0.0003801
# ## MEANS:  allenii - 3.055  ; vill - 0.1947
# t.test(Wherb22_subset$ave_per_herb ~ Wherb22_subset$spp) 
# # t = 4.873, df = 10.446, p-value = 0.0005718 -> use wilcox as appropriate for non-normal data

# 2019/2022 combined
Wherb19_subset$year <- '2019'
Wherb22_subset$year <- '2022'
names(Wherb19_subset) <- c("date","ID", "spp", "ave_per_herb", "year")
Wherb_full_subset <- Wherb22_subset %>% 
  dplyr::select(spp,ave_per_herb,year) %>% 
  full_join((Wherb19_subset %>% dplyr::select( "spp", "ave_per_herb", "year")),.)

# plotting subset data
# plot of herbivory subset data - probably not for inclusion # 
ggplot(data=Wherb_full_subset, aes(x=year, y=ave_per_herb, fill = spp)) +
  geom_boxplot(outlier.shape=NA) + 
  geom_point(aes(alpha=0.5), position=position_jitterdodge()) +
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("alle" = expression(italic("C. allenii")),
                             "vill" = expression(italic("C. villosissimus")))) +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) +
  ggtitle("Field herbivory subset") +
  xlab("") +
  ylab("percent herbivory")+
  scale_x_discrete(labels=c("alle" = expression(italic("C. allenii")), 
                            "vill" = expression(italic("C. villosissimus"))))

Code
# ggsave("wild_per_herbivory_plot.png", device = "png", width = 10, height = 8, units = "cm")

Here we poke at the data with some stats… Not sure what is best here. All analyses say what is pretty obvious from the plot: percent herbivory clearly varies by species, but likely due to 0s, and does not clearly vary here between years - I think this is an artifact of taking a small subset. Beetles generally seem to eat similar (though highly variable) amounts in each year once they get to a suitable plant - they just find fewer suitable plants in drier years (fewer nice moist rolled leaves).

Code
# INCOMPLETE: some analyses... 
# Testing model assumptions
# visual assesssment
lm <- lm(ave_per_herb ~ spp + year, data = Wherb_full_subset)
par(mfrow=c(2,2))
plot(lm) # not great - definitely not a normal Q-Q plot, unequal variance btw alle and vill

Code
# homoscedasticity statistical test
ncvTest(lm) # bad... linear regression feels like a really bad idea. 
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 16.4147, Df = 1, p = 5.0889e-05
Code
summary(lm) # reasonable seeming results... year not popping, but makes sense since it was a weaker effect (though still very strong) in the presence absence data

Call:
lm(formula = ave_per_herb ~ spp + year, data = Wherb_full_subset)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6200 -0.2747  0.0800  0.1474  3.6800 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.6200     0.3843   6.818 4.96e-08 ***
sppvill      -2.7000     0.4437  -6.085 4.81e-07 ***
year2022      0.3547     0.4437   0.800    0.429    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.403 on 37 degrees of freedom
Multiple R-squared:  0.5045,    Adjusted R-squared:  0.4777 
F-statistic: 18.83 on 2 and 37 DF,  p-value: 2.284e-06
Code
# two way anova to look at the impact of species and year on herbivory
aov2 <- aov(ave_per_herb ~ spp + year, data = Wherb_full_subset)
summary(aov2) # species significant but not year. 
            Df Sum Sq Mean Sq F value   Pr(>F)    
spp          1  72.90   72.90  37.030 4.81e-07 ***
year         1   1.26    1.26   0.639    0.429    
Residuals   37  72.84    1.97                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# 2 non-parametric mean comparisons.
wilcox.test(ave_per_herb ~ spp, data = Wherb_full_subset) #W = 372, p-value = 7.095e-07

    Wilcoxon rank sum test with continuity correction

data:  ave_per_herb by spp
W = 372, p-value = 7.095e-07
alternative hypothesis: true location shift is not equal to 0
Code
wilcox.test(ave_per_herb ~ year, data = Wherb_full_subset) #W = 165, p-value = 0.3185

    Wilcoxon rank sum test with continuity correction

data:  ave_per_herb by year
W = 165, p-value = 0.3185
alternative hypothesis: true location shift is not equal to 0

Leaf toughness

We gathered leaf toughness data from wild plants of both species (18 C. allenii and 20 C. villosissimus) using a leaf penetrometer, which measures the pressure required to poke a hole in the leaf. Here we evaluate whether there is a difference in leaf toughness between species.

First, we read in and plot the data:

Code
# read in the data
tough <- read.csv("leaf.toughness.csv", header = TRUE)

# Plot leaf toughness 
ggplot(data=tough, aes(x=species, y=ave)) +
  geom_boxplot(outlier.shape=NA, aes(fill = species)) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("alle" = expression(italic("C. allenii")),
                             "vill" = expression(italic("C. villosissimus")))) +
  ggtitle("Leaf Toughness") +
  geom_jitter(alpha=0.6, width=0.15) +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) +
  xlab("") +
  ylab("leaf toughness (g)")+
  scale_x_discrete(labels=c("alle" = expression(italic("C. allenii")), 
                            "vill" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) +
  geom_text(x='vill', y=198, aes(label = "p<0.001"), size=2.5)

Code
#ggsave("leaf_toughness_plot.png", device = "png", width = 10, height = 7, units = "cm")

Next, we evaluate the data, find that it best fits a lognormal distribution, and run both a nonpparametric test and t-test on log transformed data.

Code
# subset the data to evaluate alle and vill data distributions separately
alle.tough <- tough[tough$species =="alle",] # subsetting alle toughness data
vill.tough <- tough[tough$species =="vill",] # subsetting vill toughness data
shapiro.test(alle.tough$ave) # not normally distributed (tough leaves spike)

    Shapiro-Wilk normality test

data:  alle.tough$ave
W = 0.85313, p-value = 0.009493
Code
hist(alle.tough$ave, breaks=20)

Code
shapiro.test(vill.tough$ave) # not normally distributed, but sorta close; p-value = 0.0765

    Shapiro-Wilk normality test

data:  vill.tough$ave
W = 0.91415, p-value = 0.0765
Code
hist(vill.tough$ave, breaks=20)

Code
# QQ plots of different fits 
qqp(alle.tough$ave, "norm")

[1]  1 18
Code
qqp(alle.tough$ave, "lnorm") #best 

[1]  1 18
Code
shapiro.test(log(alle.tough$ave))

    Shapiro-Wilk normality test

data:  log(alle.tough$ave)
W = 0.9019, p-value = 0.06202
Code
qqp(vill.tough$ave, "norm")

[1] 20 19
Code
qqp(vill.tough$ave, "lnorm") #best 

[1] 20 19
Code
shapiro.test(log(vill.tough$ave)) # could be normal (not not normal)

    Shapiro-Wilk normality test

data:  log(vill.tough$ave)
W = 0.97175, p-value = 0.7912
Code
wilcox.test(tough$ave ~ tough$species, paired = F) #W = 294, p-value = 0.000566 

    Wilcoxon rank sum exact test

data:  tough$ave by tough$species
W = 294, p-value = 0.000566
alternative hypothesis: true location shift is not equal to 0
Code
t.test(log(tough$ave) ~ tough$species, paired = F) # t = 3.6241, df = 33.24, p-value = 0.0009574

    Welch Two Sample t-test

data:  log(tough$ave) by tough$species
t = 3.6241, df = 33.24, p-value = 0.0009574
alternative hypothesis: true difference in means between group alle and group vill is not equal to 0
95 percent confidence interval:
 0.1166056 0.4148987
sample estimates:
mean in group alle mean in group vill 
          4.907907           4.642155 
Code
# both say the same thing... which is better to use in the article? 

Leaf chemistry

We have chemistry data for oxidative capacity and chemical concentrations for each of three compound classes: phenolics, flavonoids, and steroidal saponins (aka steroidal glycosides). Here, we compare chemical cconcentrations and oxidative capacity between species.

First, we read in and plot the data:

Code
# read in data and adjust column headers
chem <- read.csv('Full_chem_data.csv', header = T)
names(chem) <-  c("Accession", "Species", "DPPH", "Oxidative.Capacity", 
                  "Concentration.of.Phenolics", "Concentration.of.Flavonols",
                  "Concentration.of.NF.Phenolics", "Concentration.of.Saponins")       
# Plot Chemistry data
# Saponins
 S <- ggplot(chem, aes(x=Species, y=Concentration.of.Saponins)) +
  geom_boxplot(outlier.shape=NA, aes(fill = Species)) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species") +
  ggtitle("Saponins") + 
  geom_jitter(alpha=0.6, width=0.15) +
  guides(fill="none") +
  theme(plot.title = element_text(hjust = 0.5)) + 
  xlab("") +
  ylab("Concentration of Saponins") + 
  #coord_cartesian(ylim = c(0, 1)) +
  scale_x_discrete(labels=c("ALLE" = expression(italic("C. allenii")), 
                            "VILL" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) +
  geom_text(x='VILL', y=0.75, aes(label = "p<0.001"), size=3.5) + # y=0.99 for limited y
  theme(axis.text = element_text(size = 12))     

# Non-flavonoid Phenolics
P <- ggplot(chem, aes(x=Species, y=Concentration.of.NF.Phenolics)) +
  geom_boxplot(outlier.shape=NA, aes(fill = Species)) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species") +
  ggtitle("Non-Flavonoid Phenolics") +
  geom_jitter(alpha=0.6, width=0.15) +
  guides(fill="none") +
  theme(plot.title = element_text(hjust = 0.5)) + 
  xlab("") +
  ylab("Concentration of non-Flavonoid Phenolics") + 
  #coord_cartesian(ylim = c(0, 1)) +
  scale_x_discrete(labels=c("ALLE" = expression(italic("C. allenii")), 
                            "VILL" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) + 
  geom_text(x='VILL', y=0.8, aes(label = "p<0.01"), size=3.5) + # y=0.99 for limited y
  theme(axis.text = element_text(size = 12))   

# Flavonids
Fl <- ggplot(chem, aes(x=Species, y=Concentration.of.Flavonols)) +
  geom_boxplot(outlier.shape=NA, aes(fill = Species)) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("ALLE" = expression(italic("C. allenii")),
                             "VILL" = expression(italic("C. villosissimus")))) +
  ggtitle("Flavonoids") +
  geom_jitter(alpha=0.6, width=0.15) +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) + 
  xlab("") +
  ylab("Concentration of Flavonoids") + 
  #coord_cartesian(ylim = c(0, 0.3)) +
  guides(fill="none") +
  scale_x_discrete(labels=c("ALLE" = expression(italic("C. allenii")), 
                            "VILL" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) + 
  geom_text(x='ALLE', y=0.13, aes(label = "p<0.01"), size=3.5) + # y=0.299 for limited y and x='VILL'
  theme(axis.text = element_text(size = 12))   

# Oxidative capacity 
OC <- ggplot(chem, aes(x=Species, y=Oxidative.Capacity)) +
  geom_boxplot(outlier.shape=NA, aes(fill = Species)) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("ALLE" = expression(italic("C. allenii")),
                             "VILL" = expression(italic("C. villosissimus")))) +
  ggtitle("Oxidative Capacity") +
  geom_jitter(alpha=0.6, width=0.15) +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) + 
  xlab("") +
  ylab("Oxidative Capacity (1-DPPH)") + 
  #coord_cartesian(ylim = c(0, 2.1)) +
  guides(fill="none") +
  scale_x_discrete(labels=c("ALLE" = expression(italic("C. allenii")), 
                            "VILL" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) + 
  geom_text(x='VILL', y=2.0, aes(label = "p<0.01"), size=3.5) + # y=2.099 for limited y
  theme(axis.text = element_text(size = 12)) 

# Joining the plots with patchwork
 P + S + OC + Fl + plot_annotation(tag_levels = 'A', tag_suffix = ')') & 
  theme(plot.tag.position = c(0, 1),
        plot.tag = element_text(size = 11.5, hjust = 0, vjust = 0))

Code
##ggsave("Chemistry/Chemistry_plot", device = "png", width = 24, height = 20, units = "cm")
#ggsave("Chemistry_plot_free_scale.png", device = "png", width = 18, height = 20, units = "cm")

Next, we coonduct t-tests comparing values between species and correct for multiple tests.

Code
# assessing normality
# dividing the data by speceis to assess within species normality
# within group normality is the assumption of an unpaired students/welches t-test
alle_chem <- chem %>% filter(Species=="ALLE")
vill_chem <- chem %>% filter(Species=="VILL")

# # non-Flavonoid Phenolics 
# shapiro.test(alle_chem$Concentration.of.NF.Phenolics) # W = 0.9032, p-value = 0.3508
# qqp(alle_chem$Concentration.of.NF.Phenolics, "norm") 
# shapiro.test(vill_chem$Concentration.of.NF.Phenolics) # W = 0.94897, p-value = 0.7204
# qqp(vill_chem$Concentration.of.Phenolics_nf, "norm") 
# 
# # Flavonoids
# shapiro.test(alle_chem$Concentration.of.Flavonols) # W = 0.96981, p-value = 0.8971
# qqp(alle_chem$Concentration.of.Flavonols, "norm") 
# shapiro.test(vill_chem$Concentration.of.Flavonols) # W = 0.90093, p-value = 0.3367
# qqp(vill_chem$Concentration.of.Flavonols, "norm") 
# 
# # Saponins 
# shapiro.test(alle_chem$Concentration.of.Saponins) # W = 0.86689, p-value = 0.1743
# qqp(alle_chem$Concentration.of.Saponins, "norm") 
# shapiro.test(vill_chem$Concentration.of.Saponins) # W = 0.85908, p-value = 0.1486
# qqp(vill_chem$Concentration.of.Saponins, "norm") 
# 
# # Oxidative Capacity
# shapiro.test(alle_chem$Oxidative.Capacity) # W = 0.74073, p-value = 0.01014
# qqp(alle_chem$Oxidative.Capacity, "norm") # NOT normal
# hist(alle_chem$Oxidative.Capacity, breaks = 20)
# shapiro.test(vill_chem$Oxidative.Capacity) # W = 0.86363, p-value = 0.1631
# qqp(vill_chem$Oxidative.Capacity, "norm") 
# hist(vill_chem$Oxidative.Capacity, breaks = 20)

# Stats
# Phenolics
t.test(chem$Concentration.of.NF.Phenolics~chem$Species, alternative = "two.sided")

    Welch Two Sample t-test

data:  chem$Concentration.of.NF.Phenolics by chem$Species
t = 5.0247, df = 6.4714, p-value = 0.001919
alternative hypothesis: true difference in means between group ALLE and group VILL is not equal to 0
95 percent confidence interval:
 0.1997338 0.5662662
sample estimates:
mean in group ALLE mean in group VILL 
         0.5228095          0.1398095 
Code
# wilcox.test(chem$Concentration.of.Phenolics_nf~chem$Species) # qualitatively equvalent to t.test
# t = 5.0247, df = 6.4714, p-value = 0.001919

# Flavonids
t.test(chem$Concentration.of.Flavonols~chem$Species)

    Welch Two Sample t-test

data:  chem$Concentration.of.Flavonols by chem$Species
t = -3.9658, df = 7.7076, p-value = 0.004465
alternative hypothesis: true difference in means between group ALLE and group VILL is not equal to 0
95 percent confidence interval:
 -0.04589939 -0.01200537
sample estimates:
mean in group ALLE mean in group VILL 
         0.0800000          0.1089524 
Code
# wilcox.test(chem$Concentration.of.Flavonoids~chem$Species) # qualitatively equvalent to t.test
# t = -3.9658, df = 7.7076, p-value = 0.004465

# Saponins
t.test(chem$Concentration.of.Saponins~chem$Species)

    Welch Two Sample t-test

data:  chem$Concentration.of.Saponins by chem$Species
t = 5.5942, df = 11.886, p-value = 0.0001214
alternative hypothesis: true difference in means between group ALLE and group VILL is not equal to 0
95 percent confidence interval:
 0.1483432 0.3379426
sample estimates:
mean in group ALLE mean in group VILL 
         0.6570952          0.4139524 
Code
# wilcox.test(chem$Concentration.of.Saponins~chem$Species) # qualitatively equvalent to t.test
# t = 5.5942, df = 11.886, p-value = 0.0001214

# Oxidative activity
t.test(chem$Oxidative.Capacity~chem$Species)

    Welch Two Sample t-test

data:  chem$Oxidative.Capacity by chem$Species
t = 4.5951, df = 6.8692, p-value = 0.002624
alternative hypothesis: true difference in means between group ALLE and group VILL is not equal to 0
95 percent confidence interval:
 0.1444690 0.4532453
sample estimates:
mean in group ALLE mean in group VILL 
          2.015143           1.716286 
Code
# wilcox.test(chem$Oxidative.Capacity~chem$Species) # W = 49, p-value = 0.0005828 ; stronger result, but both <0.01. 
# t = 4.5951, df = 6.8692, p-value = 0.002624

# multiple test correction
p <- c(0.001919, 0.004465, 0.0001214, 0.002624)
p.adjust(p, method = "BH")
[1] 0.003498667 0.004465000 0.000485600 0.003498667
Code
# 0.003498667 0.004465000 0.000485600 0.003498667

We see that even with test correction for the four t-tests, chemical concentrations and oxidative capacity are all statistically different between species (p<0.01).

Beetle feeding trials

Palatability trials present beetles with a single plant species and measure how much they eat. Choice trials present beetles with both plant species simultaneously and measure how much they eat of each. In June 2019, and again in May and June 2022, we collected wild beetles from various species of Costus growing along creeks in the areas surrounding Pipeline road and recorded the species on which beetles were found. To ensure trial beetles weren’t biased towards one of our focal species, we did not collect beetles from either C. villosissimus or C. allenii. Beetles did not have access to leaves for 12 hours before we placed them in either palatability or choice trials. Each beetle was only used in a singel trial before release. To quantify herbivory, we laid a transparency printed with a mm2 grid over the leaf squares and counted mm2 of herbivore damage.

Palatability

For palatability trials, we placed a single beetle in a ramekin with a 1.5 cm2 square of either C. villosissimus or C. allenii leaf tissue and quantified the leaf area eaten after 12 hours. In 2019, we conducted 24 palatability trials for each plant species.

First, we import, clean, and plot the palatibility data:

Code
# importing beetle trial data
btd <- read.csv("beetle_trials.csv", header = TRUE) 

## palatability trial data
alleBT <- btd[btd$Species.in.trial =="alle",] # subsetting rows with alle as trial
villBT <- btd[btd$Species.in.trial =="vill",] # subsetting rows with vill as trial
alleBT$alle.mm2eaten <- as.numeric(as.character(alleBT$alle.mm2eaten))
villBT$vill.mm2eaten <- as.numeric(as.character(villBT$vill.mm2eaten))
# Reshaping the dataset
singleSPalle <- data.frame(alleBT$Species.found.on, alleBT$Species.in.trial, alleBT$alle.mm2eaten, alleBT$date)
names(singleSPalle) <- c("species.found.on", "trial.species", "mm2.eaten", "date")
singleSPvill <- data.frame(villBT$Species.found.on, villBT$Species.in.trial, villBT$vill.mm2eaten, villBT$date)
names(singleSPvill) <- c("species.found.on", "trial.species", "mm2.eaten", "date")
singleSP <- rbind(singleSPalle, singleSPvill)
# removing 0s (8 from alle, 9 from vill) (common in insect behavior to get 0s that are thrown out unless different between treatments)
singleSP.no0 <- singleSP[singleSP$mm2.eaten != 0,]

#simple box plot of palatability data, shows potential for slight trend towards more vill herbivory.
palatability <- ggplot(singleSP.no0, aes(trial.species, mm2.eaten)) + 
  geom_boxplot(outlier.shape=NA, aes(fill = trial.species)) + 
  ggtitle("Palatability Trials") + 
  geom_jitter(alpha=0.6, width=0.15) +
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species") +
  guides(fill="none") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylab('herbivory ('~mm^2*')') + 
  xlab("") +
  coord_cartesian(ylim = c(1, 43)) +
  scale_x_discrete(labels=c("alle" = expression(italic("C. allenii")), 
                            "vill" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) +
  theme(axis.text = element_text(size = 12))   
palatability

Next, we evaluate whether beetles find C. villosissimus or C. allenii more palatable:

Code
# check out the distribution of the data:
hist(singleSP.no0$mm2.eaten, breaks = 50) # clearly not normal

Code
# looking at the fit of different distributions using QQ plots
par(mfrow=c(3,1))
qqp(singleSP.no0$mm2.eaten, "norm")
[1] 19 26
Code
qqp(singleSP.no0$mm2.eaten, "lnorm") # best fit 
[1] 19 26
Code
MASS::fitdistr(singleSP.no0$mm2.eaten,"Poisson")
    lambda  
  9.7741935 
 (0.5615127)
Code
qqPlot(singleSP.no0$mm2.eaten, distribution = "pois",lambda = 2) # okay fit 

[1] 19 26
Code
# regression model - log transformed response and date as a random effect
gBPT  <- lmer(log(singleSP.no0$mm2.eaten) ~ trial.species * (1 | date), data = singleSP.no0)  
summary(gBPT)
Linear mixed model fit by REML ['lmerMod']
Formula: log(singleSP.no0$mm2.eaten) ~ trial.species * (1 | date)
   Data: singleSP.no0

REML criterion at convergence: 90.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.56883 -0.48596  0.00364  0.76128  1.65852 

Random effects:
 Groups   Name        Variance Std.Dev.
 date     (Intercept) 0.2975   0.5455  
 Residual             0.9697   0.9847  
Number of obs: 31, groups:  date, 5

Fixed effects:
                  Estimate Std. Error t value
(Intercept)         1.5532     0.3679   4.221
trial.speciesvill   0.2756     0.4293   0.642

Correlation of Fixed Effects:
            (Intr)
trl.spcsvll -0.551
Code
# regression model - log transformed response and date as a fixed effect
gBPT2  <- lm(log(singleSP.no0$mm2.eaten) ~ trial.species * date, data = singleSP.no0)  
summary(gBPT2)

Call:
lm(formula = log(singleSP.no0$mm2.eaten) ~ trial.species * date, 
    data = singleSP.no0)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.54099 -0.54614 -0.00721  0.64679  1.63843 

Coefficients: (3 not defined because of singularities)
                              Estimate Std. Error t value Pr(>|t|)  
(Intercept)                     0.5973     0.5722   1.044   0.3069  
trial.speciesvill               0.4313     0.6648   0.649   0.5226  
date7/12/19                     2.2129     0.9047   2.446   0.0221 *
date7/13/19                     0.9437     0.6839   1.380   0.1803  
date7/6/19                      1.2582     0.7569   1.662   0.1095  
date7/9/19                      0.1662     1.0074   0.165   0.8703  
trial.speciesvill:date7/12/19  -0.8364     1.0472  -0.799   0.4323  
trial.speciesvill:date7/13/19       NA         NA      NA       NA  
trial.speciesvill:date7/6/19        NA         NA      NA       NA  
trial.speciesvill:date7/9/19        NA         NA      NA       NA  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.991 on 24 degrees of freedom
Multiple R-squared:  0.3318,    Adjusted R-squared:  0.1647 
F-statistic: 1.986 on 6 and 24 DF,  p-value: 0.1075
Code
# HOW DO I determine if I can leave out date as a random effect and just run a t-test? 

t.test(log(singleSP.no0$mm2.eaten) ~ trial.species, data = singleSP.no0) # t = -1.1442, df = 28.831, p-value = 0.2619

    Welch Two Sample t-test

data:  log(singleSP.no0$mm2.eaten) by trial.species
t = -1.1442, df = 28.831, p-value = 0.2619
alternative hypothesis: true difference in means between group alle and group vill is not equal to 0
95 percent confidence interval:
 -1.2311570  0.3479486
sample estimates:
mean in group alle mean in group vill 
          1.601304           2.042908 
Code
# # regression model - poisson 
# pBPT  <- glmer(mm2.eaten ~ trial.species + (1 | date), 
#                data = singleSP.no0, 
#                family = poisson)  
# summary(pBPT)

# evaluating model assumptions
# compare residuals
# log transformed
qqnorm(resid(gBPT)) 
qqline(resid(gBPT))
# # poisson
# qqnorm(resid(pBPT)) 
# qqline(resid(pBPT))
# # log looks a bit better

# looking for over/under dispersion
# log transformed
gsimOut <- simulateResiduals(gBPT)
plot(gsimOut) # looks good

Code
testDispersion(gsimOut) # p-value = 0.928

    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.92956, p-value = 0.928
alternative hypothesis: two.sided
Code
# # poisson
# psimOut <- simulateResiduals(pBPT)
# plot(psimOut) # no red flags but doesn't fit as well as log transformed
# testDispersion(psimOut) # p-value = 0.152
# # both don't set off warnings, but log transformed looks better.

Choice

For choice trials, we placed beetles in ramekins (again with wet paper and perforated lids) with one 1.5 cm2 leaf square of both C. villosissimus and C. allenii and quantified the leaf area eaten for each species after 12 hours. We conducted 13 successful choice trials in 2019 and 22 in 2022 (a trial was considered unsuccessful and removed if the beetle did not eat either leaf square).

Again, we first read in, clean, and plot the data. We will plot it alongside the palatability data for easy comparison.

Code
# read in the data
btd <- read.csv("beetle_trials.csv", header = TRUE) 
# subset rows with alle and vill together as trial - aka the choice trials
BCT <- btd[btd$Species.in.trial =="alle/vill",] 
# converting character to numeric
BCT$vill.mm2eaten <- as.numeric(BCT$vill.mm2eaten)
BCT$alle.mm2eaten <- as.numeric(BCT$alle.mm2eaten) 
# remove trials in which beetles did not eat either spp
BCT.no.0 <- BCT[,1:6] %>% filter(!(vill.mm2eaten == 0 & alle.mm2eaten == 0))
# remove trials in which beetles were collected from alle (focal species)
BCT.no.0 <- BCT.no.0 %>% filter(!(Species.found.on == "alle"))

# plotting
BCT.no.0$beetleID <- 1:nrow(BCT.no.0) # assign each trial (each beetle) an ID before splitting each trial into two rows (one for each species) 
# convert to long format
BCT.no.0.long <- BCT.no.0 %>% 
  pivot_longer(cols = c(vill.mm2eaten, alle.mm2eaten),
               names_to = "trial.spp",
               values_to = "mm2eaten")
# remove '.mm2eaten' from species column
BCT.no.0.long$trial.spp <- substr(BCT.no.0.long$trial.spp, 1, 4) 
# box plot of choice data 
choice <- ggplot(BCT.no.0.long, aes(trial.spp, mm2eaten)) + 
  geom_boxplot(outlier.shape=NA, aes(fill = trial.spp)) + 
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("alle" = expression(italic("C. allenii")),
                             "vill" = expression(italic("C. villosissimus")))) +
  ggtitle("Choice Trials") + 
  geom_jitter(alpha=0.6, width=0.15) +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) + 
  xlab("") +
  ylab(bquote("herbivore damage ("~mm^2~"eaten)")) + 
  guides(fill="none") +
  scale_x_discrete(labels=c("alle" = expression(italic("C. allenii")), 
                            "vill" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(hjust = 0.6, colour = "black")) + 
  #geom_text(x='alle', y=33, aes(label = "p<0.05"), size=3.5) +
  theme(axis.text = element_text(size = 12))   

# plot palatability and choice feeding trials together
palatability + choice + plot_annotation(tag_levels = 'A', tag_suffix = ')') & 
  theme(plot.tag.position = c(0, 1),
        plot.tag = element_text(size = 11.5, hjust = 0, vjust = 0))

Code
#ggsave("RLB_feeding_trials_plot", device = "png", width = 13, height = 7, units = "cm")
#ggsave("beetle_choice_plot", device = "png", width = 5, height = 5, units = "cm")
#ggsave("beetle_choice_plot2", device = "png", width = 9, height = 6, units = "cm")

Now for the analysis! We first take the difference of mm2 eaten of each species for each beetle/trial to get a single preference value. We then evaluate the impact of a random effect, date, on that preference value.

Code
# take difference of mm2 eaten of each species for each beetle/trial to get a single preference value. (mm2 vill eaten - mm2 alle eaten)
BCT.no.0 <- BCT.no.0 %>% mutate(preference = vill.mm2eaten - alle.mm2eaten)
# assess data distribution
hist(BCT.no.0$preference, breaks=20) # looks normal with gaps haha, so not normal but likely from a normal distribution... I think. 

Code
# evaluate random effect of date on which trial was conducted (temp, humidity, etc of a particular day may impact beetle behavior)
# as fixed effect
BCT_lm <- lm(preference ~ date, data = (BCT.no.0))
summary(BCT_lm) # looks like date doesn't matter

Call:
lm(formula = preference ~ date, data = (BCT.no.0))

Residuals:
    Min      1Q  Median      3Q     Max 
-21.083  -6.292   0.000   4.583  22.000 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -6.500      5.688  -1.143   0.2635  
date5/27/22    8.500      8.688   0.978   0.3369  
date6/16/22   10.500      9.852   1.066   0.2963  
date6/18/22   13.500      8.044   1.678   0.1053  
date6/19/22    7.833      8.688   0.902   0.3756  
date6/21/22    2.500      7.631   0.328   0.7458  
date7/1/22     8.500     12.718   0.668   0.5098  
date7/11/19    3.500     12.718   0.275   0.7853  
date7/9/19    12.583      6.568   1.916   0.0664 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.38 on 26 degrees of freedom
Multiple R-squared:  0.1927,    Adjusted R-squared:  -0.05565 
F-statistic: 0.7759 on 8 and 26 DF,  p-value: 0.6272
Code
# as random effect (just looking at variance)
BCT_lmer <- lmer(preference ~ (1|date), data = (BCT.no.0))
summary(BCT_lmer) # variance explained by date 2 orders of magnitude lower than residual variance :)
Linear mixed model fit by REML ['lmerMod']
Formula: preference ~ (1 | date)
   Data: (BCT.no.0)

REML criterion at convergence: 263.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.2678 -0.5058 -0.1254  0.3429  2.4385 

Random effects:
 Groups   Name        Variance Std.Dev.
 date     (Intercept)   5.923   2.434  
 Residual             117.246  10.828  
Number of obs: 35, groups:  date, 9

Fixed effects:
            Estimate Std. Error t value
(Intercept)    1.706      2.082   0.819

We see that, when treated as a fixed effect, no date has a significant (to the .05 level) impact on beetle preference. When treated as a random effect, the variance explained by date 2 orders of magnitude lower than residual variance. Together, this indicates date is not important to include when assessing preference. So, next we compare the mean amount eaten of each species within a trial with a paired t-test to evaluate whether beetles prefer one species over ther other in controlled trial conditions

Code
# now that I am convinced the random variable is not strongly influencing preference, I can compare amount eaten of each species within trial with a paired T-test. 
# paired t-test
t.test(BCT.no.0$vill.mm2eaten, BCT.no.0$alle.mm2eaten, paired = T) # t = 1.0992, df = 34, p-value = 0.2794

    Paired t-test

data:  BCT.no.0$vill.mm2eaten and BCT.no.0$alle.mm2eaten
t = 1.0992, df = 34, p-value = 0.2794
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -1.746155  5.860441
sample estimates:
mean difference 
       2.057143 

OKAY! So I think this way to analyze the data makes sense, but I need help figuring out how to talk about it. The p-value is definitely not ‘significant’ (0.28), but there is a trend in the data towards greater herbivory on vill than alle (as seen in the figure above). I also saw this in the other ways I analyzed the data. For example, in a bayesian framework, 80% or so of the posterior distribution supported beetles eating more vill than alle. How do I talk about this in the article to accurately represent the data?

Here is my best guess: Roleld leaf beetles, on average, ate more C. villosissimus than C. allenii. These trends were not significant at the 0.05 level, likely due to high variability in beetle behavior. However, the trend is present in both palatibility and choice trials, and aligns with chemistry and toughness data: beetles are expected to prefer the species with thinner, less chemically defended leaves. This pattern is also contradictory to the clearly greater beetle herbivory on C. allenii than C. villosissimus seen in the wild. To understand whether this contradiction is due to differing environmental conditions in the habitat of each species, we placed ‘herbivory arrays’ into intermediate habitat.

Herbivory arrays

Herbivory arrays consisted of a potted plant of each species (C. villosissimus and C. allenii) placed near a wild plant of a third species (C. scaber) that was colonized by at least one roleld leaf beetle. Arrays were placed in intermediate habitat along a creek on the seasonally dry side of the panamanian isthmus. It was not pooossible to accurately quantify the area eaten (area og beetle damage) after plants were in the field for 3 weeks, so we instead evaluate the proportion of healthy stems that experienced herbivory. (analysis of best guess estimates of area yeild qualitatively the same result)

Data importing, cleaning, and plotting:

Code
array <- read.csv("Herbivory_arrays_220703.csv", header = T)
names(array) <- c("ID", "array_num", "alt_ID","spp", "site", "date", "herbivory_yn", 
                  "num_RLs", "num_stms_herb", "lf1_herb", "lf1_area", "lf2_herb", "lf2_area", 
                  "lf3_herb", "lf3_area", "lf4_herb", "lf4_area", "lf5.herb", "lf5_area", 
                  "lf6.herbivory", "lf6.area", "lf7.herbivory", "lf7.area", "lf8.herbivory", 
                  "lf8.area", "lf9.herbivory", "lf9.area", "lf10.herbivory", "lf10.area", 
                  "lf11.herbivory", "lf11.area", "notes")

# Filter to only include census dates, select RL count cols, calculate proportion of rls with herbivory
array <- array %>% 
  dplyr::select(ID, array_num, spp, site, date, herbivory_yn, num_RLs, num_stms_herb) %>% # choose only certain columns to include
  filter(!is.na(num_stms_herb)) %>% # remove rows with no data in the 'num_stms_herb' column
  mutate(prop_rl_herb = num_stms_herb/num_RLs)

# plot
ggplot(array, aes(spp, prop_rl_herb)) + 
  geom_boxplot(aes(fill=spp)) + 
  ggtitle("Herbivory Arrays") +
  geom_jitter(alpha=0.6, width=0.15) +
  scale_fill_manual(values = c("#cc79a7", "#f0e442"), name = "Species",
                    labels=c("alle" = expression(italic("C. allenii")),
                             "vill" = expression(italic("C. villosissimus")))) +
  theme(plot.title = element_text(hjust = 0.5), legend.text.align = 0) + 
  xlab("") +
  guides(fill="none") +
  ylab("Proportion of stems with herbivory") + 
  scale_x_discrete(labels=c("alle" = expression(italic("C. allenii")), 
                            "vill" = expression(italic("C. villosissimus")))) + 
  theme(axis.text.x = element_text(angle = 10, hjust = 0.6, colour = "black"))

Code
# ggsave("arrays_lf_proportion_plot.png", device = "png", width = 6, height = 8, units = "cm")

Analyses:

Code
# regression model - beta distribution (proportion)
# add 0.00001 to 0 values and subtract it from 1 values to fit beta regression
array <- array %>% mutate(prop_rl_herb = case_when(
  prop_rl_herb == 0 ~ 0.00001,
  prop_rl_herb == 1 ~ 0.9999,
  TRUE ~ prop_rl_herb
))

# histogram of response
hist(array$prop_rl_herb, breaks=10)

Code
# regression model
AH <- glmmTMB(prop_rl_herb ~ spp + (1 | site) + (1 | date),
              data = array, 
              family=beta_family(link = "logit"))
summary(AH)  
 Family: beta  ( logit )
Formula:          prop_rl_herb ~ spp + (1 | site) + (1 | date)
Data: array

     AIC      BIC   logLik deviance df.resid 
  -434.6   -424.9    222.3   -444.6       47 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 site   (Intercept) 2.560e-09 5.060e-05
 date   (Intercept) 3.966e-12 1.991e-06
Number of obs: 52, groups:  site, 11; date, 3

Dispersion parameter for beta family (): 0.321 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   0.8201     0.2744   2.988  0.00281 **
sppvill      -0.4777     0.3749  -1.274  0.20267   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# evaluating model assumptions
# check residuals
qqnorm(resid(AH)) 
qqline(resid(AH)) # bad

Code
# looking for over/under dispersion
simOut <- simulateResiduals(AH)
plot(simOut) # BAD

Code
testDispersion(simOut) # 


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.80236, p-value = 0.096
alternative hypothesis: two.sided
Code
# regardless of model distribution, both random effects (site and date)
# explain almost no variation. The model also does not fit well 
# due to the odd distribution of the response variable,
# Attempting t-test below given that random variables appear unimportant 
# and the difference of amount eaten between species is roughly normal. 

# get proportion of rolled leaves in long format for paired t-test
dat <- array %>% pivot_wider(., id_cols=array_num, names_from = spp, values_from = prop_rl_herb)
# assessing whether difference is normally distributed for use of paired t-test
shapiro.test(dat$alle-dat$vill) # p-value = 0.0359; close to normal

    Shapiro-Wilk normality test

data:  dat$alle - dat$vill
W = 0.91582, p-value = 0.03591
Code
hist(dat$alle-dat$vill)

Code
qqp(dat$alle-dat$vill, "norm")

[1]  8 11
Code
# paired t-test
wilcox.test(dat$alle, dat$vill, paired = T) # V = 110, p-value = 0.111

    Wilcoxon signed rank test with continuity correction

data:  dat$alle and dat$vill
V = 111, p-value = 0.102
alternative hypothesis: true location shift is not equal to 0
Code
t.test(dat$alle, dat$vill, paired = T) # t = 1.6954, df = 25, p-value = 0.1024

    Paired t-test

data:  dat$alle and dat$vill
t = 1.6954, df = 25, p-value = 0.1024
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.04102641  0.42303616
sample estimates:
mean difference 
      0.1910049 
Code
# all results are qualitatively (and close to quantitatively) the same: no significant difference between species.
## NOTE - pick up in paper - explain as above - report t-test